I. Introduction

Amid the Coronavirus Disease (COVID-19) pandemic in 2020, governments around the world developed responses to aid their citizens and mitigate the spread of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This study aims to predict the cumulative number of confirmed COVID-19 cases per 10,000 individuals in different countries on the 23rd of October 2020, using the governments' earlier responses to the outbreak (as of the 15th of June 2020) together with some other World Bank indicators of interest. It builds upon the findings of the previous research (Project 2) by considering a nonlinear model.

The data used in this study were obtained from The Humanitarian Data Exchange data portal and include the total population of each country in 2019 [1], the cumulative number of confirmed cases of COVID-19 in different countries [2] on the 23rd of October 2020, the Stringency and Economic Support indices on the 15th of June 2020 [3], the total smoking prevalence of people ages 15 and above in 2016 [4], the number of nurses and midwives per 1,000 people in 2018 [5], and the percentage of the population that consists of people ages 15 to 64 in 2019 [6].

When comparing the number of infected individuals across countries, the population size of each country needs to be considered. The study therefore looks at the cumulative number of confirmed cases of COVID-19 per 10,000 individuals in each country on the 23rd of October 2020, calculated as: \(\frac{\text{cumulative cases in the country}}{\text{total population of the country}} \cdot 10{,}000\). This variable and date are the same as those in Project 2, since the aim is to analyse the same cumulative confirmed cases data with a different approach.
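The normalisation above can be sketched in a few lines of Python (the report's own analysis appears to have been run in R; Python is used here only for illustration). The function name and the input figures are hypothetical and are not taken from the dataset.

```python
def cases_per_10000(cumulative_cases: float, population: float) -> float:
    """Return cumulative confirmed cases per 10,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cumulative_cases * 10_000 / population


# Hypothetical country: 12,500 confirmed cases among 5,000,000 inhabitants.
rate = cases_per_10000(12_500, 5_000_000)
print(f"{rate:.1f} cases per 10,000")
```

Expressing counts per 10,000 puts small and large countries on a common scale before they are compared or modelled.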

Project 2 showed that the continuous variables Stringency Index and Economic Support Index, measured on the 15th of June 2020, are related to the cumulative number of confirmed cases of COVID-19 per 10,000 on the 23rd of October 2020. These two indices are therefore included in this study to quantify the government response to the outbreak. The Stringency Index accounts for closure, containment, and public health measures. The Economic Support Index accounts for the economic response taken by the governments. Note that the data provide different government responses for different regions within certain countries, for example the United States of America. Since this study looks at each country as a whole, the government response of a country on the 15th of June 2020 is taken to be the average of the responses of all its regions on that day. The data from the 15th of June 2020 are used, just as in Project 2, since the same relationship between the cumulative number of confirmed cases and government response is being analysed, but with a different approach and with additional predictor variables.
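The regional averaging described above might look like the following Python sketch. The region names and index values are hypothetical, and an unweighted mean is an assumption (the report does not say whether regions were weighted, for example by population).

```python
from statistics import mean

# Hypothetical regional Stringency Index values for one country on one day.
regional_stringency = {"Region A": 70.0, "Region B": 60.0, "Region C": 65.0}


def country_level_index(regional_values: dict) -> float:
    """Unweighted mean of the regional index values (an assumption)."""
    return mean(list(regional_values.values()))


country_stringency = country_level_index(regional_stringency)
print(country_stringency)
```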

Three other predictor variables were chosen for this study. The total smoking prevalence of people ages 15 and above is considered because smoking brings an object (cigarette, pipe, etc.) that may have touched a contaminated surface (hand, ashtray, etc.) into contact with a mucous membrane (the mouth), which is how SARS Coronavirus 2 mainly spreads [7]. This variable has no data for ages below 15, but that may not be an issue, since smokers below 15 years old are less common. The most recent data available are from 2016, so that is what will be analysed. The number of nurses and midwives per 1,000 people in 2018 (most recent data available) is also considered, since nurses are in contact with sick or infected individuals with low immunity, which includes identified and non-identified COVID-19 patients. Nurses may accidentally fail to take proper precautions to prevent the spread of the disease from one patient to another, so they may play a key role in the spread of the virus. The percentage of the population that consists of people ages 15 to 64 in 2019 (most recent data available) is also considered, since people in this age group are in contact with more people than children and retired individuals are, due to college, university, work, socializing opportunities, etc.

Other World Bank indicators were available, but it was decided not to use them. For some, the most recent data available are from 2013 or before (proportion of population spending [8], etc.); some have more to do with the number of deaths than with the spread of the disease (percent of deaths in a year caused by communicable diseases [9], etc.); and for others we could not find a logical way to relate them to the increase in the number of infected individuals (hospital beds [10], etc.).

After organizing the data and removing countries with missing values, 78 countries remain represented in the dataset, out of the 195 countries in the world (approximately 40%) [11]. That is about 46.7% of the data used in Project 2, which included 167 countries.
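The two proportions quoted above follow directly from the country counts:

\[ \frac{78}{195} = 0.40, \qquad \frac{78}{167} \approx 0.467 \]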

Table 1. Sample of 5 randomly chosen countries from the data set used in this study

| Country | cumulative_confirmed_cases_per_10000 | Stringency_Index | Economic_Support_Index | Economic_Support_Index_levels |
|---|---|---|---|---|
| United Arab Emirates | 125.144708 | 72.22 | 50.0 | [50,62.5) |
| Italy | 80.412925 | 55.56 | 75.0 | [75,87.5) |
| New Zealand | 3.933293 | 22.22 | 62.5 | [62.5,75) |
| Mali | 1.751956 | 52.78 | 50.0 | [50,62.5) |
| Qatar | 461.539222 | 80.56 | 37.5 | [37.5,50) |

| Country | Population2019 | age15_64_population_prop_2019 | nurses_midwives_per_1000_2018 | Smoking_prevalence_15_2016 |
|---|---|---|---|---|
| United Arab Emirates | 9770529 | 84.13084 | 5.7271 | 28.9 |
| Italy | 60297396 | 63.82121 | 5.7401 | 23.7 |
| New Zealand | 4917000 | 64.43944 | 12.4482 | 16.0 |
| Mali | 19658031 | 50.20018 | 0.3585 | 12.3 |
| Qatar | 2832067 | 84.88094 | 7.2628 | 20.6 |

II. Exploratory data analysis


Table 2: Summary for the cumulative confirmed cases per 10,000

| n | min | median | mean | max | sd |
|---|---|---|---|---|---|
| 78 | 0.0334753 | 27.50465 | 67.20095 | 461.5392 | 92.11358 |

Our total sample size was 78 (Table 2). The mean cumulative confirmed cases (CCC) per 10,000 is about 67.20, far greater than the median of 27.50, indicating that the CCC distribution is heavily right-skewed, which can easily be observed in Figure 1. This is to be expected, since the lowest possible CCC is 0 whereas there is no such bound on the highest value. Most countries have a CCC below the 300 mark, but we also notice some very extreme cases (outliers) around the 450 mark.

Figure 1. Distribution for the cumulative confirmed cases per 10,000 for individual countries

The distribution of the Stringency Index (Figure 2), which measures government response, resembles a bell shape, although with a slight skew in the left tail. The Economic Support Index distribution (Figure 3), which records measures such as income support and debt relief, also seems to be slightly left-skewed. There are two modes, at 50 and 75, but we suspect that could be due to rounding. In Figure 4, the proportion of the population that is 15-64 years old looks slightly left-skewed, with two outliers around the 85 mark. Figure 5 shows an extremely right-skewed distribution of nurses and midwives per 1,000. Finally, smoking prevalence for people ages 15 and above (Figure 6) looks reasonably normally distributed.

Figure 2. Distribution for the government response measured by the Stringency Index

Figure 3. Distribution for the government response measured by the Economic Support Index

Figure 4. Distribution for the Proportion of population that is 15-64 years old, in 2019 for individual countries

Figure 5. Distribution for nurses and midwives per 1000 in 2018 for individual countries

Figure 6. Distribution for the Smoking Prevalence for 15+ years olds, in 2016 for individual countries

In Figure 7.1, the scatterplot shows that there seems to be some correlation between the cumulative confirmed cases per 10,000 (CCC) and the Stringency Index, which suggests, without implying any causal effect, that countries with a higher number of cases per 10,000 also tend to have stricter pandemic-response policies. There are outliers (in particular the point above the 400 mark of CCC) that might have a strong influence on the best-fit line. We also included a Loess curve, which shows an upward trend followed by a drop (although it should be read with caution, since it may overfit). For the many countries with CCC close to 0, the Stringency Index varies the most (from 0 to 100) compared with other CCC levels, with more points clustering in the [50, 75] range. The same spread is seen for the Economic Support Index, which suggests that countries with very low CCC spend widely varying amounts on income support and debt relief packages. Countries with higher CCC nevertheless tend to spend somewhat more on these packages.

Figure 7.1. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Stringency Index. The red line is the best fit line. The blue curve is the Loess curve.

The scatterplot in Figure 7.2 of CCC against the Stringency Index, grouped by Economic Support Index (ESI) level, shows drastically different slopes for each interval of ESI. This suggests complex behavior in the data, which can be better observed in Figure 12.

Figure 7.2. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Stringency Index, grouped by the Economic Support Index levels

The scatterplot in Figure 8 of CCC against the Economic Support Index has more points at the bottom and fewer at the top. This suggests that countries with fewer cases per 10,000 individuals tend to spend less on economic relief packages.

Figure 8. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Economic Support Index. The red line is the best fit line. The blue curve is the Loess curve.

Figures 9, 10, and 11 show the scatterplots of CCC against the proportion of the population that is 15-64 years old, the smoking prevalence of people ages 15 and above, and nurses and midwives per 1,000, respectively. All except smoking prevalence show some relationship between CCC and the respective predictor.

Figure 9. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their Proportion of population that is 15-64 years old, in 2019. The red line is the best fit line. The blue curve is the Loess curve.

Figure 10. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their Smoking prevalence for 15+ year olds in 2016. The red line is the best fit line. The blue curve is the Loess curve.

Figure 11. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their nurses and midwives per 1,000 in 2018. The red line is the best fit line. The blue curve is the Loess curve.

Figure 12. Boxplot of relationship between the cumulative confirmed cases per 10,000 for individual countries and the Economic Support Index levels


III. Multiple linear regression

i. Methods


The discussion of Figure 7.2 suggested that the slope changes sharply across intervals of the Economic Support Index, so a linear model might not be the best way to capture this complex behavior of the data. We therefore decided to use a natural spline model.

We will use the transformed response, Y raised to the power of 0.2, since the previous report showed skewness and non-normality in the error terms.
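To illustrate what raising the response to the power of 0.2 does, the following Python sketch applies it to the five sample countries listed in Table 1. It is only an illustration on those five values, not a reproduction of the full analysis; the variable names are hypothetical.

```python
# Cumulative confirmed cases per 10,000 for the five countries in Table 1.
ccc = {
    "United Arab Emirates": 125.144708,
    "Italy": 80.412925,
    "New Zealand": 3.933293,
    "Mali": 1.751956,
    "Qatar": 461.539222,
}

POWER = 0.2  # exponent applied to the response in this report

transformed = {country: value ** POWER for country, value in ccc.items()}

for country, value in transformed.items():
    print(f"{country}: {value:.3f}")

# Ratio of the largest to the smallest value, before and after transforming.
raw_ratio = max(ccc.values()) / min(ccc.values())
transformed_ratio = max(transformed.values()) / min(transformed.values())
print(f"raw ratio: {raw_ratio:.1f}, transformed ratio: {transformed_ratio:.2f}")
```

The transformation compresses the largest values far more than the smallest ones, which is why it pulls in the long right tail of the response.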

Figure 13. Distribution for the cumulative confirmed cases per 10,000 raised to 0.2, for individual countries

We apply natural splines to the following model: \[ \begin{aligned}\widehat{Y}_{CCPTTH}^{0.2} =& b_{0} + b_{SI} \cdot (x_1) + b_{ESI} \cdot (x_2) + b_{15to65 APP} \cdot (x_{3}) \\ & + b_{NM} \cdot (x_{4}) + b_{SP} \cdot (x_{12}) \end{aligned} \]
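As a rough illustration of what a natural cubic spline term contributes, the Python sketch below builds a natural cubic spline basis in the textbook truncated-power form. This is not the basis R's `ns()` computes (that one is built from B-splines), although with the same knots both should describe the same family of curves. The boundary knots at 0 and 100 are an assumption made for the example; the function name is hypothetical.

```python
def natural_spline_basis(x, knots):
    """Natural cubic spline basis (truncated-power form) evaluated at x.

    `knots` must be sorted and include both boundary knots. The constant
    column is omitted, so len(knots) - 1 values are returned.
    """
    last = knots[-1]
    second_last = knots[-2]

    def pos_cube(u):
        # (u)_+^3: cube of the positive part
        return max(u, 0.0) ** 3

    def d(knot):
        return (pos_cube(x - knot) - pos_cube(x - last)) / (last - knot)

    basis = [float(x)]  # linear term
    for knot in knots[:-2]:
        basis.append(d(knot) - d(second_last))
    return basis


# Interior knots 25, 50, 75 as in the model; boundary knots 0 and 100 assumed.
SI_KNOTS = [0.0, 25.0, 50.0, 75.0, 100.0]

row = natural_spline_basis(60.0, SI_KNOTS)
print(len(row))  # one column per degree of freedom of the spline term
```

With three interior knots the basis has four columns, which matches the 4 degrees of freedom reported for the Stringency Index term. Each basis function is linear beyond the boundary knots, which is the defining property of a natural spline.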

From Figures 14 to 21, we observe that the distribution of the error terms is now closer to bell-shaped, the normal Q-Q plot shows an almost straight line, and the residual scatterplots (Figures 16 to 21) are cloud-shaped. We may conclude that the transformation allows the model assumptions to be reasonably met, so we can proceed with our analysis.

Figure 14. Normal Q-Qplot for the model under discussion

Figure 15. Residuals distribution for the statistical model

Figure 16. Residuals graph for the fitted values, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 17. Residuals graph for the Stringency Index, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 18. Residuals graph for the Economic Support Index, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 19. Residuals graph for the Proportion of population that is 15-64 years old, in 2019, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 20. Residuals graph for the Smoking prevalence for 15+ year olds in 2016, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 21. Residuals graph for the nurses and midwives per 1000 in 2018, with a Lowess curve in blue and a horizontal line at zero in red.

Table 3: Correlation matrix for the numeric variables in the study

| Variable | CCCPTTH | CCCPTTH^0.2 | 2019 Population | SI | ESI | 15 to 64 y/o 2019 population proportion | NM 2018 | SP 2016 |
|---|---|---|---|---|---|---|---|---|
| cumulative_confirmed_cases_per_10000 | 1.000 | 0.857 | -0.039 | 0.335 | 0.121 | 0.574 | 0.350 | -0.023 |
| cumulative_confirmed_cases_per_10000_transf | 0.857 | 1.000 | 0.039 | 0.404 | 0.165 | 0.541 | 0.393 | -0.031 |
| Population2019 | -0.039 | 0.039 | 1.000 | 0.081 | 0.035 | 0.033 | -0.082 | -0.075 |
| Stringency_Index | 0.335 | 0.404 | 0.081 | 1.000 | -0.136 | 0.204 | -0.315 | -0.261 |
| Economic_Support_Index | 0.121 | 0.165 | 0.035 | -0.136 | 1.000 | 0.061 | 0.293 | 0.061 |
| age15_64_population_prop_2019 | 0.574 | 0.541 | 0.033 | 0.204 | 0.061 | 1.000 | 0.324 | 0.243 |
| nurses_midwives_per_1000_2018 | 0.350 | 0.393 | -0.082 | -0.315 | 0.293 | 0.324 | 1.000 | 0.325 |
| Smoking_prevalence_15_2016 | -0.023 | -0.031 | -0.075 | -0.261 | 0.061 | 0.243 | 0.325 | 1.000 |

In Table 4, the GVIF values for the variables with 1 degree of freedom and the GVIF^(1/(2*Df)) values for the variables with more than 1 degree of freedom are all between 1 and 5. This indicates at most moderate correlation between the predictor variables, so multicollinearity is limited and the statistical power of the model is not greatly reduced.

Table 4: VIF table

##                                                        GVIF Df GVIF^(1/(2*Df))
## ns(Stringency_Index, knots = c(25, 50, 75))        1.689995  4        1.067790
## Economic_Support_Index                             1.181773  1        1.087094
## ns(age15_64_population_prop_2019, knots = c(67.5)) 1.673899  2        1.137450
## ns(nurses_midwives_per_1000_2018, knots = c(10))   2.150052  2        1.210911
## Smoking_prevalence_15_2016                         1.324239  1        1.150756
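The last column of Table 4 can be recomputed from the first two. The Python sketch below does this for the reported values; the shortened term labels are ours, and the numbers are copied from the table rather than recomputed from the data.

```python
# (term, GVIF, Df) as reported in Table 4.
vif_rows = [
    ("Stringency_Index (spline)", 1.689995, 4),
    ("Economic_Support_Index", 1.181773, 1),
    ("age15_64_population_prop_2019 (spline)", 1.673899, 2),
    ("nurses_midwives_per_1000_2018 (spline)", 2.150052, 2),
    ("Smoking_prevalence_15_2016", 1.324239, 1),
]


def adjusted_gvif(gvif: float, df: int) -> float:
    """GVIF^(1/(2*Df)), which makes terms with different Df comparable."""
    return gvif ** (1.0 / (2 * df))


for term, gvif, df in vif_rows:
    print(f"{term}: {adjusted_gvif(gvif, df):.6f}")
```

For a term with a single degree of freedom the adjusted value is simply the square root of the GVIF.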

ii. Model Results


Table 5. Model Summary Table

## 
## Call:
## lm(formula = cumulative_confirmed_cases_per_10000_transf ~ ns(Stringency_Index, 
##     knots = c(25, 50, 75)) + Economic_Support_Index + ns(age15_64_population_prop_2019, 
##     knots = c(67.5)) + ns(nurses_midwives_per_1000_2018, knots = c(10)) + 
##     Smoking_prevalence_15_2016, data = tidy_joined_dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.37985 -0.23483  0.04806  0.24919  0.95877 
## 
## Coefficients:
##                                                      Estimate Std. Error
## (Intercept)                                         -0.124244   0.507311
## ns(Stringency_Index, knots = c(25, 50, 75))1         0.889603   0.501310
## ns(Stringency_Index, knots = c(25, 50, 75))2         1.518347   0.388205
## ns(Stringency_Index, knots = c(25, 50, 75))3         2.248856   0.978427
## ns(Stringency_Index, knots = c(25, 50, 75))4         1.154782   0.385021
## Economic_Support_Index                               0.002778   0.002301
## ns(age15_64_population_prop_2019, knots = c(67.5))1  1.455291   0.490662
## ns(age15_64_population_prop_2019, knots = c(67.5))2  0.913580   0.376060
## ns(nurses_midwives_per_1000_2018, knots = c(10))1    1.756999   0.388970
## ns(nurses_midwives_per_1000_2018, knots = c(10))2    1.122037   0.402862
## Smoking_prevalence_15_2016                          -0.012081   0.006710
##                                                     t value Pr(>|t|)    
## (Intercept)                                          -0.245 0.807277    
## ns(Stringency_Index, knots = c(25, 50, 75))1          1.775 0.080514 .  
## ns(Stringency_Index, knots = c(25, 50, 75))2          3.911 0.000217 ***
## ns(Stringency_Index, knots = c(25, 50, 75))3          2.298 0.024664 *  
## ns(Stringency_Index, knots = c(25, 50, 75))4          2.999 0.003796 ** 
## Economic_Support_Index                                1.207 0.231524    
## ns(age15_64_population_prop_2019, knots = c(67.5))1   2.966 0.004178 ** 
## ns(age15_64_population_prop_2019, knots = c(67.5))2   2.429 0.017814 *  
## ns(nurses_midwives_per_1000_2018, knots = c(10))1     4.517 2.61e-05 ***
## ns(nurses_midwives_per_1000_2018, knots = c(10))2     2.785 0.006950 ** 
## Smoking_prevalence_15_2016                           -1.801 0.076279 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4869 on 67 degrees of freedom
## Multiple R-squared:  0.5801, Adjusted R-squared:  0.5174 
## F-statistic: 9.256 on 10 and 67 DF,  p-value: 2.047e-09

Table 6. ANOVA Table

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| ns(Stringency_Index, knots = c(25, 50, 75)) | 4 | 6.6653288 | 1.6663322 | 7.030250 | 0.0000869 |
| Economic_Support_Index | 1 | 2.0693419 | 2.0693419 | 8.730547 | 0.0043147 |
| ns(age15_64_population_prop_2019, knots = c(67.5)) | 2 | 7.4160229 | 3.7080115 | 15.644088 | 0.0000027 |
| ns(nurses_midwives_per_1000_2018, knots = c(10)) | 2 | 5.0192683 | 2.5096342 | 10.588138 | 0.0001010 |
| Smoking_prevalence_15_2016 | 1 | 0.7684023 | 0.7684023 | 3.241887 | 0.0762790 |
| Residuals | 67 | 15.8805528 | 0.2370232 | NA | NA |
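The overall fit statistics in Table 5 can be cross-checked against the sums of squares in Table 6. The Python sketch below does this with the reported numbers; the shortened term labels are ours.

```python
# (term, Df, Sum Sq) for the model terms, as reported in Table 6.
terms = [
    ("Stringency_Index (spline)", 4, 6.6653288),
    ("Economic_Support_Index", 1, 2.0693419),
    ("age15_64_population_prop_2019 (spline)", 2, 7.4160229),
    ("nurses_midwives_per_1000_2018 (spline)", 2, 5.0192683),
    ("Smoking_prevalence_15_2016", 1, 0.7684023),
]
resid_df, resid_ss = 67, 15.8805528

model_df = sum(df for _, df, _ in terms)
model_ss = sum(ss for _, _, ss in terms)
total_ss = model_ss + resid_ss
n = model_df + resid_df + 1  # number of countries

r_squared = model_ss / total_ss
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / resid_df
f_statistic = (model_ss / model_df) / (resid_ss / resid_df)

print(f"n = {n}")
print(f"R-squared = {r_squared:.4f}, adjusted = {adj_r_squared:.4f}")
print(f"F = {f_statistic:.3f} on {model_df} and {resid_df} DF")
```

The recomputed values should agree with the multiple R-squared, adjusted R-squared, and F-statistic printed at the bottom of Table 5.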

iii. Interpreting the regression table

Our model is the following:

\[ \begin{aligned}\widehat{Y}_{CCPTTH}^{0.2} =& b_{0} + b_{SI,0-25} \cdot f_{1}(x_1) + b_{SI,25-50} \cdot f_{2}(x_1) + b_{SI,50-75} \cdot f_{3}(x_1) \\ & + b_{SI,75-100} \cdot f_{4}(x_1) + b_{ESI} \cdot (x_2) + b_{15to65 APP,50-67.5} \cdot f_{5}(x_{3}) \\ & + b_{15to65 APP,67.5-85} \cdot f_{6}(x_{3}) + b_{NM,0-10} \cdot f_{7}(x_{4}) \\ & + b_{NM,10-20} \cdot f_{8}(x_{4}) + b_{SP} \cdot (x_{12}) \\ = & -0.1242 + 0.8896 \cdot f_{1}(x_1) + 1.5183 \cdot f_{2}(x_1) + 2.2489 \cdot f_{3}(x_1) \\ & + 1.1548 \cdot f_{4}(x_1) + 0.0028 \cdot (x_2) + 1.4553 \cdot f_{5}(x_{3}) \\ & + 0.9136 \cdot f_{6}(x_{3}) + 1.7570 \cdot f_{7}(x_{4}) \\ & + 1.1220 \cdot f_{8}(x_{4}) - 0.0121 \cdot (x_{12}) \end{aligned} \]

Given the nature of splines, we do not interpret the individual model coefficients: the basis functions of a spline term all depend on the same predictor, so one coefficient cannot be varied while the others are held constant (ceteris paribus) when predicting the average cumulative cases per 10,000 raised to the power of 0.2.

\[\begin{aligned} H_0:&\beta_{SI, 0-25} = 0 \\\ \mbox{vs }H_A:& \beta_{SI, 0-25} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{SI, 25-50} = 0 \\\ \mbox{vs }H_A:& \beta_{SI, 25-50} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{SI, 50-75} = 0 \\\ \mbox{vs }H_A:& \beta_{SI,50-75} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{SI, 75-100} = 0 \\\ \mbox{vs }H_A:& \beta_{SI, 75-100} \neq 0 \end{aligned}\]

\[\begin{aligned} H_0:&\beta_{15to65 APP, 50-67.5} = 0 \\\ \mbox{vs }H_A:& \beta_{15to65 APP, 50-67.5} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{15to65 APP, 67.5-85} = 0 \\\ \mbox{vs }H_A:& \beta_{15to65 APP, 67.5-85} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{NM, 0-10} = 0 \\\ \mbox{vs }H_A:& \beta_{NM, 0-10} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{NM, 10-20} = 0 \\\ \mbox{vs }H_A:& \beta_{NM, 10-20} \neq 0 \end{aligned}\]

However, the coefficient p-values in Table 5 show that the spline terms for the Stringency Index (knots at 25, 50, and 75), the proportion of the population that is 15-64 years old (knot at 67.5), and nurses and midwives per 1,000 (knot at 10) have p-values below 0.05, with the exception of the first Stringency Index term (p = 0.081). We therefore find these predictors helpful in our model for predicting the average cumulative cases per 10,000 raised to the power of 0.2.

\[\begin{aligned} H_0:&\beta_{ESI} = 0 \\\ \mbox{vs }H_A:& \beta_{ESI} \neq 0 \end{aligned}\] \[\begin{aligned} H_0:&\beta_{SP} = 0 \\\ \mbox{vs }H_A:& \beta_{SP} \neq 0 \end{aligned}\]

In contrast, the Economic Support Index (p = 0.232) and smoking prevalence (p = 0.076) were not statistically significant at the 0.05 level.

With an adjusted R-squared of 0.5174, our model explains about half of the variability in the cumulative cases per 10,000 raised to the power of 0.2. Coupled with the significance of most predictors and the low overall p-value of 2.047e-09, this leads us to believe the model is helpful in its explanatory ability.

iv. Inference for multiple regression

The ANOVA table (Table 6) shows whether the contribution of each additional variable is significant when the variables listed before it are already in the model. Order therefore matters in the ANOVA table.

The Stringency Index, with knots at 25, 50, and 75 and 4 degrees of freedom, adds 6.665 to the sum of squares. With an F value of 7.03 and a p-value < 0.0001, we can conclude that the Stringency Index alone in the model explains a significant amount of variability.

The Economic Support Index, with 1 degree of freedom, adds 2.069 to the sum of squares. With an F value of 8.73 and a p-value < 0.05, we can conclude that the contribution of the Economic Support Index, given that the Stringency Index with knots at 25, 50, and 75 is in the model, is statistically significant.

The proportion of the population that is 15-64 years old in 2019, with a knot at 67.5 and 2 degrees of freedom, adds 7.416 to the sum of squares. With an F value of 15.644 and a p-value < 0.05, we can conclude that its contribution, given that the Stringency Index with knots at 25, 50, and 75 and the Economic Support Index are in the model, is statistically significant.

The nurses and midwives per 1,000 in 2018, with a knot at 10 and 2 degrees of freedom, adds 5.019 to the sum of squares. With an F value of 10.588 and a p-value < 0.05, we can conclude that its contribution, given that the Stringency Index with knots at 25, 50, and 75, the Economic Support Index, and the proportion of the population that is 15-64 years old in 2019 with a knot at 67.5 are in the model, is statistically significant.

The smoking prevalence for people ages 15 and above in 2016, with 1 degree of freedom, adds 0.768 to the sum of squares. With an F value of 3.24 and a p-value of 0.076, we can conclude that the contribution of smoking prevalence, given that the Stringency Index with knots at 25, 50, and 75, the Economic Support Index, the proportion of the population that is 15-64 years old in 2019 with a knot at 67.5, and nurses and midwives per 1,000 in 2018 with a knot at 10 are in the model, is not statistically significant at a significance level of 0.05.
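Each sequential F value discussed above is the term's mean square divided by the residual mean square. A short Python sketch using the values reported in Table 6 (the shortened term labels are ours):

```python
# Mean squares from Table 6, in the order the terms enter the model.
mean_squares = {
    "Stringency_Index (spline, 4 df)": 1.6663322,
    "Economic_Support_Index": 2.0693419,
    "age15_64_population_prop_2019 (spline, 2 df)": 3.7080115,
    "nurses_midwives_per_1000_2018 (spline, 2 df)": 2.5096342,
    "Smoking_prevalence_15_2016": 0.7684023,
}
RESIDUAL_MEAN_SQUARE = 0.2370232  # residual Sum Sq 15.8805528 on 67 df

f_values = {
    term: ms / RESIDUAL_MEAN_SQUARE for term, ms in mean_squares.items()
}

for term, f_value in f_values.items():
    print(f"{term}: F = {f_value:.3f}")
```

Because the sums of squares are sequential, entering the terms in a different order would generally change every row except the residuals.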

Figure 22. Interactive Scatterplot for the cumulative confirmed cases per 10,000 (raised to 0.2) for individual countries against their government response measured by the Stringency Index, where economic support index = 50, population proportion of ages 15 to 64 in 2019 = 65, nurses midwives per 1000 in 2018 = 5, and Smoking prevalence for people ages 15+ in 2016 = 25. The purple line is the spline, with its associated 95% CI and wider PI.

In Figure 22, the Stringency Index spline has three knots, at 25, 50, and 75. As can be observed from the plot, the 95% CI includes 0 on the \(CCC^{0.2}\) scale only around the knot at 25, so in that region the predicted mean \(CCC^{0.2}\) cannot be distinguished from 0.

Figure 23. Interactive Scatterplot for the cumulative confirmed cases per 10,000 (raised to 0.2) for individual countries against their nurses and midwives per 1000 in 2018, where Stringency Index = 50, economic support index = 50, population proportion of ages 15 to 64 in 2019 = 65, and Smoking prevalence for people ages 15+ in 2016 = 25. The purple line is the spline, with its associated 95% CI and wider PI.

In Figure 23, the 95% CI does not include 0 on either side of the knot at NM = 10, so we are 95% confident that the mean \(CCC^{0.2}\) is above 0 across the plotted range of nurses and midwives per 1,000 in 2018. In Table 5, both spline terms for this predictor have p-values below 0.05.

Figure 24. Interactive Scatterplot for the cumulative confirmed cases per 10,000 (raised to 0.2) for individual countries against their Proportion of population that is 15-64 years old, in 2019, where Stringency Index = 50, economic support index = 50, nurses midwives per 1000 in 2018 = 5, and Smoking prevalence for people ages 15+ in 2016 = 25. The purple line is the spline, with its associated 95% CI and wider PI.

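In Figure 24, the spline for the proportion of the population that is 15-64 years old has one knot, at 67.5. The 95% CI does not include 0 on either side of the knot, so we are 95% confident that the mean \(CCC^{0.2}\) is above 0 in both ranges. In Table 5, both spline terms for this predictor have p-values below 0.05.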

Figure 25. Interactive Scatterplot for the cumulative confirmed cases per 10,000 (raised to 0.2) for individual countries against their Smoking prevalence for 15+ year olds in 2016, where Stringency Index = 50, economic support index = 50, population proportion of ages 15 to 64 in 2019 = 65,and nurses midwives per 1000 in 2018 = 5. The purple line is the spline, with its associated 95% CI and wider PI.

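In Figure 25, the 95% CI does not include 0, so the predicted mean \(CCC^{0.2}\) is positive across the plotted range of smoking prevalence. This alone does not establish a relationship with the predictor: in Table 5, the coefficient for smoking prevalence for people ages 15 and above in 2016 is not significant at the 0.05 level (p = 0.076).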

Figure 26. Interactive Scatterplot for the cumulative confirmed cases per 10,000 (raised to 0.2) for individual countries against their government response measured by the Economic Support Index, where Stringency Index = 50, population proportion of ages 15 to 64 in 2019 = 65, nurses midwives per 1000 in 2018 = 5, and Smoking prevalence for people ages 15+ in 2016 = 25. The purple line is the spline, with its associated 95% CI and wider PI.

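In Figure 26, the 95% CI does not include 0, so the predicted mean \(CCC^{0.2}\) is positive across the plotted range of the Economic Support Index. This alone does not establish a relationship with the predictor: in Table 5, the coefficient for the Economic Support Index is not significant at the 0.05 level (p = 0.232).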

Table 7. The 95% Prediction intervals for the cumulative confirmed cases per 10,000, where Stringency Index = 20, 50, 70, 90, respectively, for \((\text{cumulative confirmed cases per 10,000})^{0.2}\) = 2, economic support index = 50, population proportion of ages 15 to 64 in 2019 = 65, nurses midwives per 1000 in 2018 = 5, and Smoking prevalence for people ages 15+ in 2016 = 25.

| SI | Point Estimate | Lower Limit | Upper Limit |
|---|---|---|---|
| 20 | 0.01993 | -1.35387 | 30.1566 |
| 50 | 11.46058 | 0.09172 | 127.5753 |
| 70 | 41.02339 | 1.66196 | 284.8189 |
| 90 | 57.59487 | 2.88898 | 369.6364 |
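The values in Table 7 appear to be on the original scale, that is, model-scale predictions raised to the power 1/0.2 = 5. This is an inference from the numbers, not something the report states. Under that assumption, taking signed fifth roots should recover intervals that are symmetric around the point estimate on the model scale, which the Python sketch below checks. The helper name is hypothetical.

```python
import math


def signed_root(value: float, power: float = 0.2) -> float:
    """Map a back-transformed value to the model scale, keeping its sign."""
    return math.copysign(abs(value) ** power, value)


# (SI, point estimate, lower limit, upper limit) as reported in Table 7.
table7 = [
    (20, 0.01993, -1.35387, 30.1566),
    (50, 11.46058, 0.09172, 127.5753),
    (70, 41.02339, 1.66196, 284.8189),
    (90, 57.59487, 2.88898, 369.6364),
]

model_scale = []
for si, point, lower, upper in table7:
    p, lo, hi = (signed_root(v) for v in (point, lower, upper))
    model_scale.append((si, p, lo, hi))
    print(f"SI={si}: point={p:.3f}, interval=({lo:.3f}, {hi:.3f}), "
          f"midpoint={(lo + hi) / 2:.3f}")
```

If the assumption holds, the negative lower limit at SI = 20 comes from a negative lower limit on the model scale. Since case counts cannot be negative, that limit would in practice be read as 0.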

For a country with Stringency Index equal to 20, Economic Support Index equal to 50, population proportion of ages 15 to 64 in 2019 = 65, nurses and midwives per 1,000 in 2018 = 5, and smoking prevalence for people ages 15+ in 2016 = 25, the predicted cumulative confirmed cases per 10,000 has a point estimate of 0.01993 cases. For a country with Stringency Index equal to 50 and the same values of the other predictors, the point estimate is 11.46 cases. The predictions for SI = 70 and SI = 90 under the same conditions are given in Table 7.

IV. Discussion

i. Conclusions

We recognize that interpretability must sometimes be traded for a better model. Our analysis shows that the proposed model seems helpful, as it explains a good share of the variability in the transformed cumulative confirmed cases of COVID-19 per 10,000 individuals (adjusted R-squared of 51.74%).

We see evidence to suggest that CCC is positively correlated with the Stringency Index and, more weakly, with the Economic Support Index. This aligns with our expectation, since it is reasonable for a government to respond more strictly and spend more on income support packages if its people are more affected by the pandemic. Moreover, CCC is also positively correlated with nurses and midwives per 1,000 and with the proportion of 15-64-year-olds in the population, which matches the expectations explained in the introduction.

ii. Limitations

This project is limited by the data available. Adding the three World Bank indicators reduced the number of countries represented, relative to Project 2, by about 53.3%, because countries with a missing value in any of the variables used were excluded. Additionally, there were some notable outliers and high-leverage points that could not be removed, since they are not errors, so their effects on the model remain.

The choice to use a non-linear model made the interpretation of the relationship between the variables more complex and less straightforward, which is not a bad thing when used appropriately. However, no test was done to check for overfitting, so the adequacy of the complexity of the model cannot be determined.

iii. Further questions

The relationship studied in this report is not one that can be generalized to other sets of dates, since no test was done to check its generalizability. This point can be further explored in another study. Moreover, another study can be done where the aim is the same as this study, but uses methods other than regression analysis.


V. Citations and References


  1. “Total Population.” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  2. “time_series_covid19_confirmed_global.csv.” Novel Coronavirus (COVID-19) Cases Data. COVID-19 Pandemic. Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

  3. “OxCGRT_CSV.” Oxford COVID-19 Government Response Stringency Index. COVID-19 Pandemic. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/oxford-covid-19-government-response-tracker

  4. “Health - Smoking prevalence, total, ages 15+.” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  5. “Health - Nurses and midwives (per 1,000 people).” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  6. “Age and Population - Population ages 15-64 (% of total population).” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  7. “Coronavirus disease (COVID-19): Prevention and risks.” Government of Canada. Accessed November 2020. https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/prevention-risks.html

  8. “Health - Proportion of population spending.” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  9. “Health - Cause of Death, by communicable diseases.” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  10. “Health - Hospital beds per 1,000 people.” World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed November 2020. https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

  11. “How many Countries are there in the World?” Worldometer. 2020. Accessed October 2020. https://www.worldometers.info/geography/how-many-countries-are-there-in-the-world/